Smart Paper Notes

AutoPV: Automated photovoltaic forecasts with limited information using an ensemble of pre-trained models

Stefan Meisenbacher, Benedikt Heidrich, Tim Martin, Ralf Mikut, Veit Hagenmeyer

Category: Machine Learning

2022-12-13

Accurate PhotoVoltaic (PV) power generation forecasting is vital for the efficient operation of Smart Grids. The automated design of such accurate forecasting models for individual PV plants poses two challenges. First, information about the PV mounting configuration (i.e., inclination and azimuth angles) is often missing. Second, for new PV plants, the amount of historical data available to train a forecasting model is limited (cold-start problem). We address these two challenges by proposing a new method for day-ahead PV power generation forecasts called AutoPV. AutoPV is a weighted ensemble of forecasting models that represent different PV mounting configurations. This representation is achieved by pre-training each forecasting model on a separate PV plant and by scaling the model's output with the peak power rating of the corresponding PV plant. To tackle the cold-start problem, we initially weight each forecasting model in the ensemble equally. To tackle the problem of missing information about the PV mounting configuration, we use new data that become available during operation to adapt the ensemble weights to minimize the forecasting error. AutoPV is advantageous as the unknown PV mounting configuration is implicitly reflected in the ensemble weights, and only the PV plant's peak power rating is required to re-scale the ensemble's output. AutoPV can also represent PV plants with panels distributed on different roofs with varying alignments, as these mounting configurations can be reflected proportionally in the weighting. Additionally, the required computing memory is decoupled when scaling AutoPV to hundreds of PV plants, which is beneficial in Smart Grids with limited computing capabilities. For a real-world data set with 11 PV plants, the accuracy of AutoPV is comparable to a model trained on two years of data and outperforms an incrementally trained model.
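
The weighting scheme described in this abstract can be illustrated with a minimal sketch. It assumes each pre-trained model returns a forecast normalized by its own plant's peak power, so the ensemble output only needs to be rescaled by the new plant's peak power. The function names, the toy numbers, and the projected-gradient weight update are illustrative assumptions; the abstract does not state which optimizer the authors use.

```python
def ensemble_forecast(weights, model_outputs, peak_power_kw):
    """Weighted sum of normalized model outputs, rescaled by the plant's peak power."""
    return peak_power_kw * sum(w * y for w, y in zip(weights, model_outputs))


def adapt_weights(weights, history, peak_power_kw, lr=0.5, steps=200):
    """Adapt ensemble weights to reduce the mean squared forecasting error.

    history: list of (model_outputs, measured_power_kw) pairs seen during operation.
    The update rule (gradient step, clip, renormalize) is an assumption made for
    illustration, not the method from the paper.
    """
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for outputs, measured in history:
            # Work in normalized units so the step size does not depend on plant size.
            err = sum(wi * yi for wi, yi in zip(w, outputs)) - measured / peak_power_kw
            for i, yi in enumerate(outputs):
                grad[i] += 2.0 * err * yi / len(history)
        # Gradient step, then keep the weights non-negative and summing to one.
        w = [max(0.0, wi - lr * gi) for wi, gi in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w] if total > 0 else [1.0 / len(w)] * len(w)
    return w


# Cold start: three pre-trained models, all weighted equally (hypothetical data).
initial_weights = [1.0 / 3.0] * 3
observations = [
    ([0.2, 0.5, 0.8], 2.9),
    ([0.6, 0.6, 0.6], 6.0),
    ([0.9, 0.4, 0.1], 7.5),
    ([0.5, 0.7, 0.3], 5.6),
]
adapted_weights = adapt_weights(initial_weights, observations, peak_power_kw=10.0)
```

In this toy setup the measurements were generated from a mix of the first two models, so the adapted weights shift toward them; this is the sense in which the unknown mounting configuration ends up encoded in the weights.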

Translated by Google Translate

Smart Data Representations: Impact on the Accuracy of Deep Neural Networks

Oliver Neumann, Nicole Ludwig, Marian Turowski, Benedikt Heidrich, Veit Hagenmeyer, Ralf Mikut

Category: Machine Learning

2021-11-17

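Deep neural networks can solve many complex tasks with less engineering effort and better performance. However, these networks are often trained and evaluated on data without investigating its representation, i.e., the form in which the data is used. In this paper, we analyze the impact of data representations on the performance of deep neural networks using energy time series forecasting. Based on an overview of exemplary data representations, we select four of them and evaluate them using two different deep neural network architectures and three forecasting horizons on real-world energy time series. The results show that, depending on the forecasting horizon, the same data representation can have a positive or negative impact on the accuracy of deep neural networks.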

Large Language Models with Controllable Working Memory

Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-11-09

Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. While many downstream applications provide the model with an informational context to aid its performance on the underlying task, how the model's world knowledge interacts with the factual information presented in the context remains underexplored. As a desirable behavior, an LLM should give precedence to the context whenever it contains task-relevant information that conflicts with the model's memorized knowledge. This enables model predictions to be grounded in the context, which can then be used to update or correct specific model predictions without frequent retraining. By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge. In this paper, we undertake a first joint study of the aforementioned two properties, namely controllability and robustness, in the context of LLMs. We demonstrate that state-of-the-art T5 and PaLM (both pretrained and finetuned) could exhibit poor controllability and robustness, which do not scale with increasing model size. As a solution, we propose a novel method, Knowledge Aware FineTuning (KAFT), to strengthen both controllability and robustness by incorporating counterfactual and irrelevant contexts to standard supervised datasets. Our comprehensive evaluation showcases the utility of KAFT across model architectures and sizes.

Teacher Guided Training: An Efficient Framework for Knowledge Transfer

Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

Category: Machine Learning

2022-08-14

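Large pretrained models (e.g., GPT-3) achieve remarkable performance because they are exposed to massive amounts of data during training. Similarly, distilling such large models into compact models for efficient deployment also requires a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models while avoiding the need for a large volume of data. TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain, which typically corresponds to a manifold of much lower dimension than the input space. Furthermore, we can use the teacher to explore the input space more efficiently through sampling or gradient-based methods, which makes TGT especially attractive for limited-data or long-tail settings. We formally capture this benefit of the proposed data-domain exploration in our generalization bounds. We find that TGT can improve accuracy on several image classification benchmarks as well as a range of text classification and retrieval tasks.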

Learning grammar with a divide-and-concur neural network

Sean Deyo, Veit Elser

Category: Natural Language Processing | Machine Learning

2022-01-18

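We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art natural language processing models, our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable: one can read off from a solution how to construct grammatically valid sentences. Another advantage of our approach is the ability to infer meaningful grammatical rules from just a few sentences, compared to the hundreds of gigabytes of training data many other models employ. We demonstrate several ways of applying our approach: classifying words and inferring a grammar from scratch, taking an existing grammar and refining its categories and rules, and taking an existing grammar and expanding its lexicon as it encounters new words in new data.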

Avoiding Traps in Nonconvex Problems

Sean Deyo, Veit Elser

Category: Machine Learning

2021-06-09

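Iterative projection methods may become trapped at non-solutions when the constraint sets are nonconvex. Two kinds of parameters are available to help avoid this behavior, and this study gives examples of both. The first kind, called a hyperparameter, includes any parameter that appears in the definition of the iteration rule itself. The second kind comprises metric parameters in the definition of the constraint sets, a feature that arises when the problem to be solved has two or more kinds of variables. Through examples, we show the importance of properly tuning both kinds of parameters and offer heuristic interpretations of the observed behavior.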
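
To make the idea of a hyperparameter inside the iteration rule concrete, here is a small sketch of an iterative projection method on a toy nonconvex problem. The specific rule (a relaxed Douglas-Rachford-style update with step parameter `beta`), the two constraint sets (the unit circle and a horizontal line), and all function names are assumptions chosen for illustration; the abstract does not specify which rule or problems the study uses.

```python
import math


def project_circle(p):
    """Nearest point on the unit circle, which is a nonconvex constraint set."""
    x, y = p
    r = math.hypot(x, y)
    if r == 0.0:
        return (1.0, 0.0)  # every circle point is equally near; pick one
    return (x / r, y / r)


def project_line(p, height=0.6):
    """Nearest point on the horizontal line y = height."""
    return (p[0], height)


def step(p, beta):
    """One update: p + beta * (P_B(2 * P_A(p) - p) - P_A(p)).

    beta is the hyperparameter: it appears in the iteration rule itself,
    not in the definition of the constraint sets.
    """
    pa = project_circle(p)
    reflected = (2.0 * pa[0] - p[0], 2.0 * pa[1] - p[1])
    pb = project_line(reflected)
    return (p[0] + beta * (pb[0] - pa[0]), p[1] + beta * (pb[1] - pa[1]))


def solve(start, beta=0.5, iterations=200):
    """Iterate from start and report the circle projection of the final point."""
    p = start
    for _ in range(iterations):
        p = step(p, beta)
    return project_circle(p)


solution = solve((1.0, 0.0))
```

A point that lies in both sets is left unchanged by `step`, so solutions are fixed points of the iteration. Whether and how fast the iterates reach one depends on `beta` and on the starting point.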

Long-tail learning via logit adjustment

Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Category:

2020-07-14

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these challenges. Our techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training. Such adjustment encourages a large relative margin between logits of rare versus dominant labels. These techniques unify and generalise several recent proposals in the literature, while possessing firmer statistical grounding and empirical performance. A reference implementation of our methods is available at: https://github.com/google-research/google-research/tree/master/logit_adjustment. Recently, long-tail learning has received renewed interest in the context of neural networks. Two active strands of work involve post-hoc normalisation of the classification weights [
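
The two modifications named in this abstract can be sketched in a few lines. The sketch assumes the adjustment takes the form of a `tau * log(prior)` offset per class, subtracted from the logits at prediction time (post-hoc) or added to them inside the softmax cross-entropy (in the loss). The function names and the scaling parameter `tau` are illustrative; the linked reference implementation is the authoritative version.

```python
import math


def posthoc_adjusted_prediction(logits, class_priors, tau=1.0):
    """Predict argmax_y of logits[y] - tau * log(prior[y]) for an already trained model."""
    adjusted = [f - tau * math.log(p) for f, p in zip(logits, class_priors)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def logit_adjusted_loss(logits, label, class_priors, tau=1.0):
    """Softmax cross-entropy computed on logits[y] + tau * log(prior[y])."""
    shifted = [f + tau * math.log(p) for f, p in zip(logits, class_priors)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(shifted)
    log_norm = m + math.log(sum(math.exp(s - m) for s in shifted))
    return log_norm - shifted[label]


# Hypothetical two-class example: class 0 is dominant, class 1 is rare.
priors = [0.9, 0.1]
logits = [2.0, 1.5]
plain_choice = posthoc_adjusted_prediction(logits, priors, tau=0.0)
adjusted_choice = posthoc_adjusted_prediction(logits, priors, tau=1.0)
```

With `tau=0.0` both functions reduce to the ordinary argmax and the ordinary cross-entropy. With `tau=1.0` the rare class receives a boost at prediction time, and during training an example of the rare class incurs a larger loss unless its logit exceeds the dominant one by a wider margin.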

How To Backdoor Federated Learning

Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov

Category:

2018-07-02

Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. Federated models are created by aggregating model updates submitted by participants. To protect confidentiality of the training data, the aggregator by design has no visibility into how these updates are generated. We show that this makes federated learning vulnerable to a model-poisoning attack that is significantly more powerful than poisoning attacks that target only the training data. A malicious participant can use model replacement to introduce backdoor functionality into the joint model, e.g., modify an image classifier so that it assigns an attacker-chosen label to images with certain features, or force a word predictor to complete certain sentences with an attacker-chosen word. These attacks can be performed by a single participant or multiple colluding participants. We evaluate model replacement under different assumptions for the standard federated-learning tasks and show that it greatly outperforms training-data poisoning. Federated learning employs secure aggregation to protect confidentiality of participants' local models and thus cannot prevent our attack by detecting anomalies in participants' contributions to the joint model. To demonstrate that anomaly detection would not have been effective in any case, we also develop and evaluate a generic constrain-and-scale technique that incorporates the evasion of defenses into the attacker's loss function during training.
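
The arithmetic behind model replacement can be shown on a toy example. The sketch assumes the aggregator applies the averaging rule G_next = G + (eta / n) * sum_i (L_i - G) and that benign updates are close to zero because the joint model has nearly converged. Under those assumptions, a single submission scaled by gamma = n / eta survives the averaging. Models are plain lists of floats here, and the function names and numbers are illustrative only, not the paper's experimental code.

```python
def aggregate(global_model, local_models, n_total, eta):
    """Averaging rule: G_next = G + (eta / n_total) * sum_i (L_i - G)."""
    updated = list(global_model)
    for local in local_models:
        for k, (l, g) in enumerate(zip(local, global_model)):
            updated[k] += (eta / n_total) * (l - g)
    return updated


def scaled_submission(global_model, target_model, n_total, eta):
    """Scale the difference to the target by gamma = n_total / eta.

    After averaging, the scaled difference shrinks back to (target - global),
    so the aggregate lands on the target when the other updates cancel out.
    """
    gamma = n_total / eta
    return [gamma * (x - g) + g for x, g in zip(target_model, global_model)]


# Toy setting: a converged joint model, two benign participants, one attacker.
current_global = [0.0, 1.0]
benign_locals = [list(current_global), list(current_global)]  # zero updates
attacker_target = [0.5, -0.5]
submission = scaled_submission(current_global, attacker_target, n_total=100, eta=1.0)
next_global = aggregate(current_global, benign_locals + [submission], n_total=100, eta=1.0)
```

The scaled submission is far from the current joint model in parameter space, which is the kind of anomaly the abstract says secure aggregation prevents the aggregator from seeing.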
